Recovering gestures from speech signals: a preliminary study for nasal vowels
Abstract
For nasal vowels, a gesture as simple as the lowering of the velum produces complex acoustic spectra. However, the perceptual space remains relatively simple: nasality is perceived easily. In this preliminary study, we use a statistical method to recover the gesture of the velum. In order to reduce the extreme variability of nasal vowels, we use a simulation based on Maeda’s model instead of a natural speech signal. In previous studies, nasality has been assumed to increase either with the size of the nasal area or with the area ratio between the nasal and oral tracts at the extremity of the velum. In this work, both types of data are considered and analyzed with linear and non-linear tools. Finally, the statistical inference is described and results are given for various areas of the nasal tract entrance and for various area ratios. The results show that the velar port area is correctly estimated for small values, while the area ratio is a better parameter when the velar port area increases.

1. ABOUT NASAL VOWELS
About 20% of the UPSID languages have a phonemic contrast between oral vowels and nasal vowels. In French, nasality is the only distinctive feature between the words [pɛ] paix (peace) and [pɛ̃] pain (bread). Nasal vowels often derive from nasal consonant assimilation. This phenomenon also exists in languages with nasal consonants, more or less pronounced according to the context. Such vowels are called nasalized vowels in order to distinguish them from nasal vowels. Nasalization, controlled or not, is a widespread feature. On the articulatory level, vowel nasalization is produced by the lowering of the velum. This simple gesture connects the nasal fossae to the oral tract. The nasal fossae are quite complex and differ greatly from one person to another (Dang & Honda, 1994), but they are fixed for a given person. The acoustic consequences of this gesture depend on the person and also on the position of the velum. Many studies have tried to find the acoustic correlates of nasality.
Most of these agree on a relative weakness of the first formant and propose other secondary correlates. However, the detailed characteristics of the spectra of nasal vowels vary with the frequency of the first oral resonance and the magnitude of nasal coupling. In spite of such acoustic complexity, listeners give similar vowel nasality judgements, regardless of the phonological status of nasalization in their native language (Beddor & Strange, 1982). The articulatory-to-acoustic relationship is quite complex and still not well understood for nasal vowels. Until now, there has been no corpus of velar port size measurements with a simultaneous speech signal. The velum is an internal organ that is difficult to access. Nowadays, MRI can provide very useful static vocal tract data. Furthermore, articulatory-acoustic inversion projects such as SPEECHMAPS have provided tools, data and encouraging results on oral vowels and fricative consonants. Can inversion techniques estimate the position of the velum or its evolution during nasal vowel production? In this preliminary study, we try to answer this question in simplified and controlled conditions. Instead of a natural speech signal, a vocal tract simulation produces acoustic transfer functions. First, the production model and the corresponding database are presented. Then analyses of the database are given, before the third section explains the inversion technique applied and the results obtained.

2. PRODUCTION MODEL OF NASAL VOWELS
2.1. Articulatory Model
Usually, the articulatory models proposed by Maeda (1988) and Mermelstein (1973) consider only the oral tract. The model used in this work is Maeda’s model, based on X-ray images of a speaker (Patricia Barbier) pronouncing French sentences. Eight parameters regulate the jaw and tongue positions, the opening and protrusion of the lips, and the larynx length. A ninth parameter, vm, is introduced to represent the velum position.
The articulatory model then calculates the vocal tract area function as if the velar port were closed, whatever the value of the parameter vm. In Maeda’s model, all parameters are normalised: they are centred on zero, which corresponds to the mean of all the positions registered on the X-ray images, and have a standard deviation of 1. For normal distributions, variations between –3 and 3 are supposed to cover most cases. A more detailed study of the parameter vm is needed to check that it has a normal distribution. The distribution of all the values of parameter vm measured on the X-ray images shows that this parameter does not follow such a distribution. The lowest values are around –1 and are obtained during plosive consonant production. When parameter vm is between –1 and 0, the velar port is closed. The highest values are close to 3 or 4 and correspond to the rest position, nasal vowels or nasal consonants. The default value 0 is used for oral vowel production; the velar port area is then zero. When parameter vm increases, the velum lowers and allows the air to go through the nasal fossae. The velum is 1 cm thick and ends in the uvula. The outlines of an X-ray image and the midsagittal section obtained by the model are both shown in Figure 1. This articulatory model thus yields both the oral area function and the velar port area, the areas being calculated from midsagittal sections.

Figure 1: Example of the outlines of an X-ray image and the outlines of the associated midsagittal section.

As data about Patricia Barbier’s nasal fossae are not available, the area functions of the nasal tract and sinuses are those given by Feng and Castelli (1996). A set of nine parameters controls the configurations of the vocal tract and the corresponding area function. As this study only concerns vowels, constraints are imposed on the constriction area, the place of constriction and the lip area.
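The normalisation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used with Maeda's model: the function names are hypothetical, the sample skewness is a crude stand-in for a proper normality test, and reading "velar port open" as vm > 0 is a simplification of the description in the text.

```python
import statistics

def normalise(values):
    """Centre raw measurements on their mean and scale them to unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def fraction_within(normalised, bound=3.0):
    """Share of normalised values lying in [-bound, bound]."""
    return sum(1 for v in normalised if abs(v) <= bound) / len(normalised)

def skewness(values):
    """Sample skewness (mean of cubed normalised values).

    A value far from zero indicates an asymmetric, hence non-normal, distribution.
    """
    z = normalise(values)
    return statistics.fmean(v ** 3 for v in z)

def velar_port_is_open(vm):
    """Simplified reading of the text: closed for vm in [-1, 0], open once vm rises above 0."""
    return vm > 0.0
```

A strongly positive skewness for the measured vm values would be consistent with the observation that vm is bounded below near –1 but reaches 3 or 4.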
2.2. Construction of the Database
Traditionally, simulation studies of nasal vowels compare transfer functions obtained with different velar port areas, hereafter called the nasal area An. The variations of the nasal area as parameter vm increases are studied for various configurations. A bijective relation exists between the parameter vm and the nasal area An, except for high values of vm, where a saturation occurs that depends on the tongue position: the velum lowers until it rests on the tongue and stays in that position even if parameter vm keeps increasing. Without taking such a saturation into account, the seven values of vm corresponding to the nasal area values of 0, 0.2, 0.4, 0.8, 1.8, 2.2, 2.4, 2.6 cm² are determined. For a null value of parameter vm, 1000 configurations are drawn at random, and transfer functions are computed for each of the seven values of parameter vm. The database thus contains 7000 transfer functions. However, we cannot ignore nasal area saturation: for the same high value of parameter vm, the nasal areas An can be very different because of the tongue position. To avoid such vowel dependence, Feng proposed using another parameter: the area ratio d (Feng & Castelli, 1996).
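The layout of the database (1000 random oral configurations, each combined with seven values of vm) can be sketched as follows. This is a minimal illustration, not the authors' code. The vm values are placeholders, since the mapping from nasal area to vm is not given in the text; the uniform draw in [-3, 3] ignores the vowel constraints mentioned in Section 2.1; and `dummy_transfer_function` is a hypothetical stand-in for the acoustic simulation.

```python
import random

N_ORAL_PARAMS = 8   # jaw, tongue, lip and larynx parameters, as counted in the text
N_CONFIGS = 1000    # random oral configurations, as stated in the text

# Placeholder vm values: illustrative only, NOT the values used in the study.
VM_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]

def draw_configuration(rng):
    """Draw one oral configuration (assumption: uniform in [-3, 3], no vowel constraints)."""
    return [rng.uniform(-3.0, 3.0) for _ in range(N_ORAL_PARAMS)]

def dummy_transfer_function(oral, vm):
    """Hypothetical stand-in for the vocal tract simulation; returns no acoustic data."""
    return None

def build_database(rng, transfer_function):
    """Combine every random oral configuration with every vm value."""
    database = []
    for _ in range(N_CONFIGS):
        oral = draw_configuration(rng)
        for vm in VM_VALUES:
            database.append((oral, vm, transfer_function(oral, vm)))
    return database

db = build_database(random.Random(0), dummy_transfer_function)
print(len(db))  # 7000 entries: 1000 configurations x 7 vm values
```

Each oral configuration is drawn once and reused across all vm values, so that entries sharing a configuration differ only in the velum position.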
Similar resources
Speech rate effects on European Portuguese nasal vowels
This paper presents new temporal information regarding the production of European Portuguese (EP) nasal vowels, based on new EMMA data. The influence of speech rate on duration of velum gestures and their coordination with consonantic and glottal gestures were analyzed. As information on relative speed of articulators is scarce, the parameter stiffness for the nasal gestures was also calculated...
The role of the pharynx and tongue in enhancement of vowel nasalization: a real-time MRI investigation of French nasal vowels
Complexity in the acoustics of nasal vowels has long been acknowledged but complexity in their articulation has received less attention. A growing body of research suggests that velopharyngeal (VP) opening is complemented by other articulatory gestures which may enhance or counteract the acoustic outcomes of VP opening. In this paper we consider the role of pharyngeal aperture and lingual posit...
Measurements of articulatory variation and communicative signals in expressive speech
This paper describes a method for acquiring data for facial movement analysis and implementation in an animated talking head. We will also show preliminary data on how a number of articulatory and facial parameters for some Swedish vowels vary under the influence of expressiveness in speech and gestures. Primarily we have been concerned in expressive gestures and emotions conveying information ...
Measuring oral and nasal airflow in production of Chinese plosive
This study reports a new integrated device for measuring oral and nasal airflow with lossless speech recording with preliminary results on Chinese plosives. An acoustically transparent fiber-made mask acts both as a material for airflow resistance and a support for partitioning oral and nasal chambers. Two pressure sensors placed directly on the mask measure air pressure signals that correspond...
Mental Timeline in Persian Speakers’ Co-speech Gestures based on Lakoff and Johnson’s Conceptual Metaphor Theory
One of the introduced conceptual metaphors is the metaphor of "time as space". Time as an abstract concept is conceptualized by a concrete concept like space. This conceptualization of time is also reflected in co-speech gestures. In this research, we try to find out what dimension and direction the mental timeline has in co-speech gestures and under the influence of which one of the metaphoric...